Dialect classification via discriminative training
Authors
Abstract
Dialect-induced variability in speech is a major factor limiting the performance of speech recognition, spoken document retrieval, and dialog systems. In this study, we propose a novel discriminative algorithm to improve unsupervised dialect classification of spontaneous Arabic speech. No transcripts are used for either training or testing, and all data are spontaneous speech. A Gaussian mixture model (GMM) serves as the baseline system for dialect classification. The main motivation is to remove confusable (distracting) regions from the dialect acoustic space while emphasizing discriminative information. The Kullback-Leibler divergence is used to find the most discriminative GMM mixtures (KLD-GMM), after which the confusable acoustic GMM region is removed. The proposed algorithm is evaluated on three dialects of Arabic and yields a measurable improvement (4.0%) over a generalized maximum likelihood estimation GMM baseline (MLE-GMM) system.
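The mixture-selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the KL divergence is turned into a ranking, which direction of the divergence is used, or how many mixtures are removed, so everything below is an assumption: diagonal-covariance components, a per-component score equal to the smallest KL divergence to any component of a competing dialect's GMM, and a hypothetical `keep_ratio` parameter. The function names and the dictionary layout are illustrative only and are not taken from the paper.

```python
import numpy as np


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def prune_confused_mixtures(target, others, keep_ratio=0.8):
    """Drop the components of `target` that lie closest to competing dialects.

    Each GMM is a dict with 'weights' (K,), 'means' (K, D) and 'vars' (K, D).
    A component's score is its minimum KL divergence to any component of any
    competing GMM; a low score marks a confusable region (assumed criterion).
    """
    n_components = len(target["weights"])
    scores = np.empty(n_components)
    for k in range(n_components):
        scores[k] = min(
            kl_diag_gauss(
                target["means"][k], target["vars"][k],
                other["means"][j], other["vars"][j],
            )
            for other in others
            for j in range(len(other["weights"]))
        )

    # Keep the highest-scoring (most discriminative) components.
    n_keep = max(1, int(round(keep_ratio * n_components)))
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])

    # Renormalize so the remaining weights still sum to one.
    weights = target["weights"][keep]
    pruned = {
        "weights": weights / weights.sum(),
        "means": target["means"][keep],
        "vars": target["vars"][keep],
    }
    return pruned, keep
```

In this sketch each dialect's GMM would be pruned against the GMMs of the other dialects, and a test utterance would then be assigned to the dialect whose pruned model gives the highest likelihood, as in a standard GMM classifier.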
Similar resources
Discriminative n-gram selection for dialect recognition
Dialect recognition is a challenging and multifaceted problem. Distinguishing between dialects can rely upon many tiers of interpretation of speech data (e.g., prosodic, phonetic, spectral, and word). High-accuracy automatic methods for dialect recognition typically use either phonetic or spectral characteristics of the input. A challenge with spectral systems, such as those based on shifted-delta...
Gaussian Mixture Selection and Data Selection for Unsupervised Spanish Dialect Classification
Automatic dialect classification has gained interest in the field of speech research because it is important for characterizing speaker traits and for estimating knowledge that could improve integrated speech technology (e.g., speech recognition, speaker recognition). This study addresses novel advances in unsupervised spontaneous Latin American Spanish dialect classification. The problem considers ...
Advances in Word based Dialect/
In an earlier study, we proposed a very effective dialect/accent classification algorithm named Word based Dialect Classification (WDC). WDC works well for large corpora and significantly outperforms traditional Large Vocabulary Continuous Speech Recognition (LVCSR) based systems, which are claimed to be the best-performing systems for language identification. For a small train...
Two feature transformation methods based on genetic algorithms for reducing support vector machine classification error
Discriminative methods are used to increase pattern recognition and classification accuracy. They can be applied as discriminant transformations of the features or as discriminative learning algorithms for the classifiers. Usually, the criteria used for discriminative transformations differ from the training criteria of discriminant classifiers or from their error. In this ...
Word-Based Dialect Identification with Georeferenced Rules
We present a novel approach for (written) dialect identification based on the discriminative potential of entire words. We generate Swiss German dialect words from a Standard German lexicon with the help of hand-crafted phonetic/graphemic rules that are associated with occurrence maps extracted from a linguistic atlas created through extensive empirical fieldwork. In comparison with a character...